physical concept
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
Yu, Mo, Liu, Lemao, Wu, Junjie, Chung, Tsz Ting, Zhang, Shunchi, Li, Jiangnan, Yeung, Dit-Yan, Zhou, Jie
We systematically investigate a widely asked question: do LLMs really understand what they say? The question relates to the more familiar term Stochastic Parrot. To this end, we propose a summative assessment over a carefully designed physical concept understanding task, PhysiCo. Our task alleviates the memorization issue by using grid-format inputs that abstractly describe physical phenomena. The grids represent varying levels of understanding, from the core phenomenon and application examples to analogies with other abstract patterns in the grid world. A comprehensive study on our task demonstrates: (1) state-of-the-art LLMs, including GPT-4o, o1 and Gemini 2.0 flash thinking, lag behind humans by ~40%; (2) the stochastic parrot phenomenon is present in LLMs, as they fail on our grid task but can describe and recognize the same concepts well in natural language; (3) our task challenges the LLMs due to intrinsic difficulties rather than the unfamiliar grid format, as in-context learning and fine-tuning on data in the same format added little to their performance.
Discover Physical Concepts and Equations with Machine Learning
Li, Bao-Bing, Gu, Yi, Wu, Shao-Feng
Machine learning can uncover physical concepts or physical equations when prior knowledge of the other is available. However, in many cases, these two aspects are coupled and cannot be discovered independently. We extend SciNet, a neural network architecture that simulates the human physical reasoning process for physics discovery, by proposing a model that combines Variational Autoencoders (VAEs) with Neural Ordinary Differential Equations (Neural ODEs). This allows us to simultaneously discover physical concepts and governing equations from simulated experimental data across diverse physical systems. We apply the model to several key examples inspired by the history of physics, including Copernicus' heliocentric solar system, Newton's law of universal gravitation, the wave function together with the Schrödinger equation, and spin-1/2 along with the Pauli equation. The results demonstrate that the neural network successfully reconstructs the corresponding theories.
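As a rough illustration of the Neural ODE half of such a model, the sketch below integrates a latent state forward in time with a fixed-step fourth-order Runge-Kutta solver. It is not the authors' implementation: the dynamics function `harmonic_dynamics` is a hand-written stand-in for a learned network, the names `rk4_step` and `integrate` are hypothetical, and the VAE encoder that would produce the initial latent state is omitted.

```python
import math
from typing import Callable, List, Sequence

State = List[float]
Dynamics = Callable[[Sequence[float], float], State]


def harmonic_dynamics(z: Sequence[float], t: float) -> State:
    """Stand-in for a learned dynamics network (assumption):
    dz/dt of a unit-frequency harmonic oscillator, z = (position, velocity)."""
    position, velocity = z
    return [velocity, -position]


def rk4_step(f: Dynamics, z: Sequence[float], t: float, dt: float) -> State:
    """Advance the latent state z by one classical Runge-Kutta (RK4) step."""
    k1 = f(z, t)
    k2 = f([zi + 0.5 * dt * ki for zi, ki in zip(z, k1)], t + 0.5 * dt)
    k3 = f([zi + 0.5 * dt * ki for zi, ki in zip(z, k2)], t + 0.5 * dt)
    k4 = f([zi + dt * ki for zi, ki in zip(z, k3)], t + dt)
    return [
        zi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for zi, a, b, c, d in zip(z, k1, k2, k3, k4)
    ]


def integrate(f: Dynamics, z0: Sequence[float], t0: float, t1: float, steps: int) -> State:
    """Integrate dz/dt = f(z, t) from t0 to t1 using `steps` equal RK4 steps."""
    dt = (t1 - t0) / steps
    z, t = list(z0), t0
    for _ in range(steps):
        z = rk4_step(f, z, t, dt)
        t += dt
    return z


if __name__ == "__main__":
    # Half a period of the oscillator: the state should flip sign.
    print(integrate(harmonic_dynamics, [1.0, 0.0], 0.0, math.pi, 1000))
```

In a trained model, the hand-written dynamics would be replaced by a neural network whose parameters are fitted so that decoded trajectories match the observations; the integration loop itself stays the same.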
ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models
Huang, Siyuan, Ponomarenko, Iaroslav, Jiang, Zhengkai, Li, Xiaoqi, Hu, Xiaobin, Gao, Peng, Li, Hongsheng, Dong, Hao
The integration of Multimodal Large Language Models (MLLMs) with robotic systems has significantly enhanced the ability of robots to interpret and act upon natural language instructions. Despite these advancements, conventional MLLMs are typically trained on generic image-text pairs, lacking essential robotics knowledge such as affordances and physical knowledge, which hampers their efficacy in manipulation tasks. To bridge this gap, we introduce ManipVQA, a novel framework designed to endow MLLMs with Manipulation-centric knowledge through a Visual Question-Answering format. This approach not only encompasses tool detection and affordance recognition but also extends to a comprehensive understanding of physical concepts. Our approach starts with collecting a varied set of images displaying interactive objects, which presents a broad range of challenges in tool object detection, affordance, and physical concept predictions. To seamlessly integrate this robotic-specific knowledge with the inherent vision-reasoning capabilities of MLLMs, we adopt a unified VQA format and devise a fine-tuning strategy that preserves the original vision-reasoning abilities while incorporating the new robotic insights. Empirical evaluations conducted in robotic simulators and across various vision task benchmarks demonstrate the robust performance of ManipVQA. Code and dataset will be made publicly available at https://github.com/SiyuanHuang95/ManipVQA.
Can Language Models Understand Physical Concepts?
Li, Lei, Xu, Jingjing, Dong, Qingxiu, Zheng, Ce, Liu, Qi, Kong, Lingpeng, Sun, Xu
Language models (LMs) are gradually becoming general-purpose interfaces in the interactive and embodied world, where the understanding of physical concepts is an essential prerequisite. However, it is not yet clear whether LMs can understand physical concepts in the human world. To investigate this, we design a benchmark, VEC, that covers the tasks of (i) Visual concepts, such as the shape and material of objects, and (ii) Embodied Concepts, learned from interaction with the world, such as the temperature of objects. Our zero (few)-shot prompting results show that the understanding of certain visual concepts emerges as LMs scale up, but there are still basic concepts to which the scaling law does not apply. For example, OPT-175B performs close to humans with a zero-shot accuracy of 85% on the material concept, yet behaves like random guessing on the mass concept. In contrast, vision-augmented LMs such as CLIP and BLIP achieve a human-level understanding of embodied concepts. Analysis indicates that the rich semantics in visual representation can serve as a valuable source of embodied knowledge. Inspired by this, we propose a distillation method to transfer embodied knowledge from VLMs to LMs, achieving a performance gain comparable to that of scaling up the parameters of LMs 134x. Our dataset is available at https://github.com/TobiasLee/VEC
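The abstract does not state which distillation objective is used. As one plausible reading, the sketch below shows the generic soft-label distillation loss: the KL divergence between temperature-softened teacher and student distributions over candidate answers. The function names, the temperature value, and the example logits are all assumptions for illustration, not details from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Generic soft-label distillation (an assumption here), with the teacher
    playing the role of a vision-language model and the student a text-only LM.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    # Hypothetical scores over two candidate answers, e.g. "heavier" vs "lighter".
    teacher = [2.0, 0.5]   # teacher is fairly confident in the first answer
    student = [0.1, 0.0]   # student is close to random guessing
    print(distillation_loss(teacher, student))
```

Minimizing this quantity with respect to the student's parameters pulls the student's answer distribution toward the teacher's; the loss is zero when the two softened distributions coincide.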
Engineers use psychology, physics, and geometry to make robots more intelligent
Robots are all around us, from drones filming videos in the sky to serving food in restaurants and defusing bombs in emergencies. Slowly but surely, robots are improving the quality of human life by augmenting our abilities, freeing up time, and enhancing our personal safety and well-being. While existing robots are becoming more proficient with simple tasks, handling more complex requests will require more development in both mobility and intelligence. Columbia Engineering and Toyota Research Institute computer scientists are delving into psychology, physics, and geometry to create algorithms so that robots can adapt to their surroundings and learn how to do things independently. This work is vital to enabling robots to address new challenges stemming from an aging society and provide better support, especially for seniors and people with disabilities.
Causal conditional hidden Markov model for multimodal traffic prediction
Zhao, Yu, Deng, Pan, Liu, Junting, Jia, Xiaofeng, Wang, Mulan
Multimodal traffic flow can reflect the health of the transportation system, and its prediction is crucial to urban traffic management. Recent works overemphasize spatio-temporal correlations of traffic flow, ignoring the physical concepts that lead to the generation of observations and their causal relationship. Spatio-temporal correlations are considered unstable under the influence of different conditions, and spurious correlations may exist in observations. In this paper, we analyze the physical concepts affecting the generation of multimode traffic flow from the perspective of the observation generation principle and propose a Causal Conditional Hidden Markov Model (CCHMM) to predict multimodal traffic flow. In the latent variables inference stage, a posterior network disentangles the causal representations of the concepts of interest from conditional information and observations, and a causal propagation module mines their causal relationship. In the data generation stage, a prior network samples the causal latent variables from the prior distribution and feeds them into the generator to generate multimodal traffic flow. We use a mutually supervised training method for the prior and posterior to enhance the identifiability of the model. Experiments on real-world datasets show that CCHMM can effectively disentangle causal representations of concepts of interest and identify causality, and accurately predict multimodal traffic flow.
On the Learnability of Physical Concepts: Can a Neural Network Understand What's Real?
Achille, Alessandro, Soatto, Stefano
We revisit the classic signal-to-symbol barrier in light of the remarkable ability of deep neural networks to generate realistic synthetic data. DeepFakes and spoofing highlight the feebleness of the link between physical reality and its abstract representation, whether learned by a digital computer or a biological agent. Starting from a widely applicable definition of abstract concept, we show that standard feed-forward architectures cannot capture but trivial concepts, regardless of the number of weights and the amount of training data, despite being extremely effective classifiers. On the other hand, architectures that incorporate recursion can represent a significantly larger class of concepts, but may still be unable to learn them from a finite dataset. We qualitatively describe the class of concepts that can be "understood" by modern architectures trained with variants of stochastic gradient descent, using a (free energy) Lagrangian to measure information complexity. Even if a concept has been understood, however, a network has no means of communicating its understanding to an external agent, except through continuous interaction and validation. We then characterize physical objects as abstract concepts and use the previous analysis to show that physical objects can be encoded by finite architectures. However, to understand physical concepts, sensors must provide persistently exciting observations, for which the ability to control the data acquisition process is essential (active perception). The importance of control depends on the modality, benefiting visual more than acoustic or chemical perception. Finally, we conclude that binding physical entities to digital identities is possible in finite time with finite resources, solving in principle the signal-to-symbol barrier problem, but we highlight the need for continuous validation.
DeepMind Gave an AI 'Intuition' by Training It Like a Baby
Babies are bubbly, cuddly, giggly balls of joy. At three months old, they already have intuition about how things around them behave--without anyone explicitly teaching them the rules of the game. This ability, dubbed "intuitive physics," seems extremely trivial on the surface. If I fill a glass with water and set it on the table, I know that the glass is an object--something I can wrap my hands around without it melting into my palms. And if it started levitating, I'd stare then immediately run out the door. Babies rapidly develop this ability by soaking up data from their external environments, forming a sort of "common sense" about the dynamics of the physical world.
Learning Physical Concepts in Cyber-Physical Systems: A Case Study
Steude, Henrik S., Windmann, Alexander, Niggemann, Oliver
Machine Learning (ML) has achieved great successes in recent decades, both in research and in practice. In Cyber-Physical Systems (CPS), ML can, for example, be used to optimize systems, to detect anomalies, or to identify root causes of system failures. However, existing algorithms suffer from two major drawbacks: (i) they are hard for human experts to interpret, and (ii) transferring results from one system to another (similar) system is often a challenge. Concept learning, or Representation Learning (RepL), is a solution to both of these drawbacks, mimicking the human approach to explainability and transferability: by learning general concepts such as physical quantities or system states, the model becomes interpretable by humans. Furthermore, concepts on this abstract level can normally be applied to a wide range of different systems. Modern ML methods are already widely used in CPS, but concept learning and transfer learning are hardly used so far. In this paper, we provide an overview of the current state of research regarding methods for learning physical concepts in time series data, which is the primary form of sensor data of CPS. We also analyze the most important methods from the current state of the art using the example of a three-tank system. Based on these concrete implementations, we discuss the advantages and disadvantages of the methods and show for which purpose and under which conditions they can be used.
Physics can assist with key challenges in artificial intelligence
Current research and applications in the field of artificial intelligence (AI) include several key challenges. These include: (a) A priori estimation of the required dataset size to achieve a desired test accuracy. For example, how many handwritten digits does a machine have to learn before being able to predict a new one with a success rate of 99%? Similarly, how many specific types of circumstances does an autonomous vehicle have to learn before its reaction will not lead to an accident? This type of realization of fast on-line decision making is representative of many aspects of human activity, robotic control and network optimization.